AI News and Product Search Page

2025-02-21 15:58:33.AIbase

Alibaba Cloud ModelScope Launches StepFun's Two Latest Open-Source Multimodal Models

2025-01-28 10:34:39.AIbase

DeepSeek Springs a Late-Night Surprise with the Launch of Its New Multimodal Model Janus-Pro

2024-12-18 17:52:23.AIbase

New Breakthrough in Multimodal Models: Fei-Fei Li's Team Unifies Actions and Language, Not Only Understanding Commands but also Reading Implicit Emotions

2024-12-10 08:03:30.AIbase

Zhipu AI Launches Free Multimodal Model GLM-4V-Flash: Enhancing Image Processing Accuracy

Beijing Zhipu Huazhang Technology Co., Ltd. announced that its Zhipu Open Platform BigModel has launched the first free multimodal API—GLM-4V-Flash. This new model leverages the excellent capabilities of the 4V series, achieving improved accuracy in image processing and further lowering the barriers for developers to delve deeper into large models across various fields.

2024-11-30 10:01:37.AIbase

Zhipu AI Open-Sources GLM-Edge Series of On-Device Large Language and Multimodal Models

Zhipu Technology recently announced that it has open-sourced its GLM-Edge series of on-device large language and multimodal models, an important step by the company toward real-world on-device use cases. The GLM-Edge series consists of four models of different sizes, GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, and GLM-Edge-V-5B, which are optimized for mobile platforms such as smartphones and vehicle systems, as well as desktop platforms like PCs.

2024-11-19 13:51:41.AIbase

Peking University Team Releases Multimodal Model LLaVA-o1, Inference Capabilities Comparable to GPT-o1!

Recently, research teams from Peking University announced the release of an open-source multimodal model called LLaVA-o1, which is claimed to be the first visual language model capable of spontaneous and systematic reasoning, comparable to GPT-o1. The model excels in six challenging multimodal benchmark tests, with its 11B parameter version outperforming competitors such as Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct.

2024-11-19 09:54:07.AIbase

Mistral Launches the Most Powerful Open Source Multimodal Model Pixtral Large, Upgrading Le Chat to Directly Call Flux Pro

2024-10-25 11:16:59.AIbase

Salesforce AI Research Unveils New Multimodal Model BLIP-3-Video: Cost-Effective Video Understanding

2024-09-27 17:37:02.AIbase

Super Powerful Multimodal Model Emu3: Understanding Images and Videos Through Next Word Prediction

2024-09-26 14:34:11.AIbase

The Open Source Multimodal Model Molmo Can Recognize Objects in Images and Generate Accurate Descriptions

Recently, an open source multimodal AI model named Molmo has drawn widespread attention in the industry. This AI system, based on Qwen2-72B and leveraging OpenAI's CLIP as the visual processing engine, is challenging the dominance of traditional commercial models with its outstanding performance and innovative features. Molmo's standout characteristic is its efficient performance. Despite its relatively small size, it can compete with competitors that are ten times larger in processing capability. This 'small but exquisite' design philosophy not only enhances the model's

2024-08-13 08:15:52.AIbase

Starred Over Ten Thousand! The MiniCPM-V2.6 Model of WallFacer Intelligence Tops GitHub

The latest version 2.6 of WallFacer's MiniCPM-V series has rapidly climbed into the Top 3 on the GitHub and HuggingFace trending lists, surpassing ten thousand stars. Since its release in February, the series has accumulated over a million downloads, becoming a benchmark for on-device model capabilities. With 8 billion parameters, MiniCPM-V2.6 improves on-device multimodal performance, including real-time video understanding, multi-image joint understanding, and multi-image in-context learning, with a quantized memory footprint of only 6GB and an inference speed of up to 18 tokens per second.

2024-08-02 09:04:21.AIbase

Google Launches Powerful Multimodal Model Gemini 1.5 Pro, Outranking GPT-4o and Claude-3.5 Sonnet

Google has released its latest AI model, Gemini 1.5 Pro, offering experimental version 0801 through Google AI Studio and the Gemini API. This model leads the LMSYS leaderboard with an Elo score of 1300, surpassing OpenAI's GPT-4o and Anthropic's Claude-3.5 Sonnet. Gemini 1.5 Pro excels in multilingual tasks, mathematics, coding, and visual tasks, and features a context window of 2 million tokens.

2024-07-31 17:56:44.AIbase

Shusheng · Puyu Lingbi Multimodal Model Upgrade Version 2.5 Supports Longer Contexts and Image-Video Understanding Comparable to GPT-4V

Shusheng · Puyu Lingbi (InternLM-XComposer) Version 2.5, developed by the Shanghai Artificial Intelligence Laboratory, focuses on long-context input and output. It was trained with 24K interleaved image-text contexts and operates smoothly at context lengths up to 96K. Key upgrades include high-resolution image understanding, fine-grained video understanding, and multi-turn multi-image dialogue. In application, it can create web pages and write high-quality text-image articles. Evaluations show it surpasses state-of-the-art open-source models across 16 benchmark tests and, on key tasks, performs on par with GPT-4V and Gem.

2024-07-16 10:24:06.AIbase

Meta Unveils Massive Multimodal Model Llama 3 405B on July 23rd

Meta is about to make a big move! They are set to launch an open-source language model called Llama 3 405B, which is not only their largest model to date but also the largest open-source language model in history. This behemoth, with an astonishing 405 billion parameters, can effortlessly navigate between images and text, completely revolutionizing the old ways that could only handle text. Key Highlights: Meta will release Llama 3 405B on July 23rd, a multimodal model with 405 billion parameters. Dec

2024-07-14 10:34:47.AIbase

New Breakthrough in Video Understanding! Google Unveils Universal Video Model VideoPrism for Precise Classification, Localization, and Retrieval All in One!

In the world of AI, making machines understand videos is much harder than making them understand images. Videos are dynamic, with sound, movement, and a myriad of complex scenes. In the past, AI approached videos as if it were reading ancient scrolls, and was often left baffled. But the introduction of VideoPrism might change everything. It's a video encoder developed by Google's research team that can achieve state-of-the-art levels on a variety of video understanding tasks with a single model. Whether it

2024-07-08 11:36:01.AIbase

Kuaishou's Open-Source Image Generation Model Kolors Enables Text Integration into Imagery

Kuaishou made a big move today by open-sourcing its in-house image generation model, "Kolors". This is not an ordinary model; it has been trained on tens of billions of text-image pairs, is equipped with a General Language Model (GLM) as a text encoder, supports bilingual Chinese and English prompts, and can handle contexts up to 256 tokens. Key Features of Kolors: Bilingual Support: Utilizes the General Language Model (GLM) as a text encoder, enabling the model to not only master English but

2024-07-08 09:52:43.AIbase

Kling AI Unleashes New Features: Web Interface Launched with Start and End Frame and Camera Movement Controls

Kling AI has made another big splash recently, not only launching a web version but also releasing several major new features. The updates include the launch of the Kling web version, a comprehensive improvement in image quality, the addition of start and end frame and camera control functions, and an extended text-to-video length of up to 10 seconds. Key Updates: Basic Model Upgrade: Kling AI's basic model has undergone significant upgrades, now capable of generating videos with higher resolutions,

2024-07-05 16:36:50.AIbase

Anime Enthusiast's Blessing! Domestic Anime AI YoYo Goes Viral - Customizable 2D Waifus at Will!

In the world of AI, there are always some innovative things that catch one's eye, such as the recently popular domestic animation video AI—YoYo. It allows countless anime enthusiasts to unleash their creativity, making characters that were once only in imagination instantly accessible. Imagine, with just a few taps, you can input a few keywords or upload an image, and YoYo can generate a lifelike anime video for you. This is no longer just a simple animation of images; it is a feast of both vis

2024-07-04 16:07:51.AIbase

StepFun Unveils Three Models: Step-2 and Beyond, Emphasizing Multimodal Capabilities

In the dazzling galaxy of AI, StepFun has emerged as a shining star with its innovative multimodal models. At the WAIC conference, it showcased three AI models, highlighting their multimodal capabilities. Step-2: The MoE model with 10 billion parameters is currently available for trial upon application. Step-1.5V, a billion parameter multimodal model, not only demonstrates extraordinary talents in image understanding but also opens up new horizons in video under

2024-07-04 14:31:38.AIbase

New Features Unveiled for Google Pixel 9: AI Integration Brings Intelligent Experience Similar to Microsoft's Recall on the Horizon!

Google is serious this time! According to the latest leak, the Google Pixel 9 series is set to bring a series of eye-catching AI features. Highlight One: Add Me Feature. Imagine your group photos are always spoiled by someone blinking or not smiling. Now, with the Add Me feature, these regrets will be a thing of the past. It can capture each person's best expression in a group photo, even merging expressions from different photos into a single group shot, ensuring everyone appears at their b
